Goto

Collaborating Authors

 original model


The friendlier the AI chatbot the more inaccurate it is, study suggests

BBC News

AI chatbots trained to be warm and friendly when interacting with users may also be more prone to inaccuracies, new research suggests. Oxford Internet Institute (OII) researchers analysed more than 400,000 responses from five AI systems which had been tweaked to communicate in a more empathetic way. Friendlier answers contained more mistakes - from giving inaccurate medical advice to reaffirming user's false beliefs, the study found. The findings raise further questions over the trustworthiness of AI models, which are often deliberately designed to be warm and human-like in order to increase engagement. Such concerns are accentuated by AI chatbots being used for support and even intimacy, as developers seek to broaden their appeal.


Towards Unbounded Machine Unlearning

Neural Information Processing Systems

Deep machine unlearning is the problem of'removing' from a trained neural network a subset of its training set. This problem is very timely and has many applications, including the key tasks of removing biases (RB), resolving confusion (RC) (caused by mislabelled data in trained models), as well as allowing users to exercise their'right to be forgotten' to protect User Privacy (UP). This paper is the first, to our knowledge, to study unlearning for different applications (RB, RC, UP), with the view that each has its own desiderata, definitions for'forgetting' and associated metrics for forget quality. For UP, we propose a novel adaptation of a strong Membership Inference Attack for unlearning. We also propose SCRUB, a novel unlearning algorithm, which is the only method that is consistently a top performer for forget quality across the different application-dependent metrics for RB, RC, and UP. At the same time, SCRUB is also consistently a top performer on metrics that measure model utility (i.e.


Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models

Neural Information Processing Systems

CLIP), have attracted widespread attention and adoption across various domains. Nonetheless, CLIP has been observed to be susceptible to adversarial examples. Through experimental analysis, we have observed a phenomenon wherein adversarial perturbations induce shifts in text-guided attention. Building upon this observation, we propose a simple yet effective strategy: Text-Guided Attention for Zero-Shot Robustness (TGA-ZSR). This framework incorporates two components: the Attention Refinement module and the Attention-based Model Constraint module.


Provable and Efficient Dataset Distillation for Kernel Ridge Regression

Neural Information Processing Systems

Deep learning models are now trained on increasingly larger datasets, making it crucial to reduce computational costs and improve data quality. Dataset distillation aims to distill a large dataset into a small synthesized dataset such that models trained on it can achieve similar performance to those trained on the original dataset. While there have been many empirical efforts to improve dataset distillation algorithms, a thorough theoretical analysis and provable, efficient algorithms are still lacking. In this paper, by focusing on dataset distillation for kernel ridge regression (KRR), we show that one data point per class is already necessary and sufficient to recover the original model's performance in many settings. For linear ridge regression and KRR with surjective feature mappings, we provide necessary and sufficient conditions for the distilled dataset to recover the original model's parameters. For KRR with injective feature mappings of deep neural networks, we show that while one data point per class is not sufficient in general, $k+1$ data points can be sufficient for deep linear neural networks, where $k$ is the number of classes. Our theoretical results enable directly constructing analytical solutions for distilled datasets, resulting in a provable and efficient dataset distillation algorithm for KRR. We verify our theory experimentally and show that our algorithm outperforms previous work such as KIP while being significantly more efficient, e.g.